Render site pages

dpp (datapackage-pipelines) runs the Knesset data pipelines periodically on our server.

This notebook shows how to run pipelines that render pages for the static website at https://oknesset.org

Load the source data

Download the source data; this can take a few minutes.


In [ ]:
!{'cd /pipelines; KNESSET_LOAD_FROM_URL=1 dpp run --concurrency 4 '\
  './committees/kns_committee,'\
  './people/committee-meeting-attendees,'\
  './members/mk_individual'}

Run the build pipeline

This pipeline aggregates the relevant data and supports filtering it for quicker development cycles.

You can uncomment and modify the filter step in committees/dist/knesset.source-spec.yaml under the build pipeline to change the filter.
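For reference, the standard datapackage-pipelines `filter` processor takes `in`/`out` lists of field–value pairs. The sketch below is hypothetical — the actual step, resource name, and field in committees/dist/knesset.source-spec.yaml may differ, so check the commented-out step in that file:

```yaml
build:
  pipeline:
    # ... earlier steps ...
    - run: filter
      parameters:
        resources: kns_committeesession   # assumed resource name
        in:
          - KnessetNum: 20                # keep only rows where KnessetNum == 20
```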

The build pipeline can take a few minutes to process for the first time.


In [2]:
!{'cd /pipelines; dpp run --verbose ./committees/dist/build'}


[./committees/dist/build:T_0] >>> INFO    :168911d3 RUNNING ./committees/dist/build
[./committees/dist/build:T_0] >>> INFO    :168911d3 Collecting dependencies
[./committees/dist/build:T_0] >>> INFO    :168911d3 Running async task
[./committees/dist/build:T_0] >>> INFO    :168911d3 Waiting for completion
[./committees/dist/build:T_0] >>> INFO    :168911d3 Async task starting
[./committees/dist/build:T_0] >>> INFO    :168911d3 Searching for existing caches
[./committees/dist/build:T_0] >>> INFO    :168911d3 Building process chain:
[./committees/dist/build:T_0] >>> INFO    :- load_resource
[./committees/dist/build:T_0] >>> INFO    :- knesset.load_large_csv_resource
[./committees/dist/build:T_0] >>> INFO    :- knesset.rename_resource
[./committees/dist/build:T_0] >>> INFO    :- load_resource
[./committees/dist/build:T_0] >>> INFO    :- filter
[./committees/dist/build:T_0] >>> INFO    :- build_meetings
[./committees/dist/build:T_0] >>> INFO    :- dump.to_path
[./committees/dist/build:T_0] >>> INFO    :- (sink)
[./committees/dist/build:T_0] >>> INFO    :load_resource: INFO    :Processed 756 rows
[./committees/dist/build:T_0] >>> INFO    :168911d3 DONE /usr/local/lib/python3.6/site-packages/datapackage_pipelines/specs/../lib/load_resource.py
[./committees/dist/build:T_0] >>> INFO    :knesset.load_large_csv_resource: INFO    :Processed 1771 rows
[./committees/dist/build:T_0] >>> INFO    :knesset.rename_resource: INFO    :Processed 1771 rows
[./committees/dist/build:T_0] >>> INFO    :168911d3 DONE /pipelines/datapackage_pipelines_knesset/processors/rename_resource.py
[./committees/dist/build:T_0] >>> INFO    :168911d3 DONE /pipelines/datapackage_pipelines_knesset/processors/load_large_csv_resource.py
[./committees/dist/build:T_0] >>> INFO    :load_resource: INFO    :Processed 76185 rows
[./committees/dist/build:T_0] >>> INFO    :filter: INFO    :Processed 1865 rows
[./committees/dist/build:T_0] >>> INFO    :build_meetings: INFO    :Processed 1865 rows
[./committees/dist/build:T_0] >>> INFO    :dump.to_path: INFO    :Processed 1865 rows
[./committees/dist/build:T_0] >>> INFO    :168911d3 DONE /usr/local/lib/python3.6/site-packages/datapackage_pipelines/specs/../lib/filter.py
[./committees/dist/build:T_0] >>> INFO    :168911d3 DONE /usr/local/lib/python3.6/site-packages/datapackage_pipelines/specs/../lib/load_resource.py
[./committees/dist/build:T_0] >>> INFO    :168911d3 DONE /pipelines/committees/dist/build_meetings.py
[./committees/dist/build:T_0] >>> INFO    :168911d3 DONE /usr/local/lib/python3.6/site-packages/datapackage_pipelines/specs/../lib/dump/to_path.py
[./committees/dist/build:T_0] >>> INFO    :168911d3 DONE /usr/local/lib/python3.6/site-packages/datapackage_pipelines/manager/../lib/internal/sink.py
[./committees/dist/build:T_0] >>> INFO    :168911d3 DONE V ./committees/dist/build {'.dpp': {'out-datapackage-url': '../../data/committees/dist/build_meetings/datapackage.json'}, 'bytes': 7557637, 'committees': 756, 'count_of_rows': 1865, 'dataset_name': '_', 'hash': 'c68616bddaacb22cb62c85cb3b4015e8', 'meetings': 94, 'mks': 1015, 'skipped committees': 0, 'skipped meetings': 0, 'skipped mks': 0}
INFO    :RESULTS:
INFO    :SUCCESS: ./committees/dist/build {'bytes': 7557637, 'committees': 756, 'count_of_rows': 1865, 'dataset_name': '_', 'hash': 'c68616bddaacb22cb62c85cb3b4015e8', 'meetings': 94, 'mks': 1015, 'skipped committees': 0, 'skipped meetings': 0, 'skipped mks': 0}

Download some protocol files for rendering

Upgrade to the latest dataflows library:


In [ ]:
!{'pip install --upgrade dataflows'}

Restart the kernel if an upgrade was performed.

Choose some session IDs to download protocol files for:


In [1]:
session_ids = [2063122, 2063126]

In [2]:
from dataflows import Flow, load, printer, filter_rows

sessions_data = Flow(
    load('/pipelines/data/committees/kns_committeesession/datapackage.json'),
    filter_rows(lambda row: row['CommitteeSessionID'] in session_ids),
    printer(tablefmt='html')
).results()


kns_committeesession

(Output shown transposed for readability; Hebrew values are followed by English translations in brackets.)

Row 1:
    CommitteeSessionID: 2063122
    Number: 29
    KnessetNum: 15
    TypeID: 161
    TypeDesc: פתוחה [open]
    CommitteeID: 2045
    Location: חדר הוועדה, באגף הוועדות (קדמה), קומה 3, חדר 3710 [the committee room, Committees Wing (Kedma), floor 3, room 3710]
    SessionUrl: http://main.knesset.gov.il/Activity/committees/Pages/AllCommitteesAgenda.aspx?Tab=3&ItemID=2063122
    BroadcastUrl: None
    StartDate: 2000-07-05 00:00:00
    FinishDate: 2000-07-05 00:00:00
    Note: [public petitions regarding the quality and standards compliance of catering services in schools, daycare centers, summer camps and public institutions]
    LastUpdatedDate: 2018-10-10 11:03:06
    download_crc32c: UCgupg==
    download_filename: files/23/4/3/434231.DOC
    download_filesize: 47154
    parts_crc32c: /4kpmQ==
    parts_filesize: 85239
    parts_parsed_filename: files/2/0/2063122.csv
    text_crc32c: pybkkw==
    text_filesize: 85134
    text_parsed_filename: files/2/0/2063122.txt
    topics: None
    committee_name: המיוחדת לפניות הציבור [the Special Committee for Public Petitions]

Row 2:
    CommitteeSessionID: 2063126
    Number: 33
    KnessetNum: 15
    TypeID: 161
    TypeDesc: פתוחה [open]
    CommitteeID: 2045
    Location: חדר הוועדה, באגף הוועדות (קדמה), קומה 3, חדר 3710 [the committee room, Committees Wing (Kedma), floor 3, room 3710]
    SessionUrl: http://main.knesset.gov.il/Activity/committees/Pages/AllCommitteesAgenda.aspx?Tab=3&ItemID=2063126
    BroadcastUrl: None
    StartDate: 2000-10-30 00:00:00
    FinishDate: 2000-10-30 00:00:00
    Note: [petitions from residents of Maor HaGola Street in the Shapira neighborhood of Tel Aviv, whose house was demolished while they continue to pay a mortgage and do not receive ...]
    LastUpdatedDate: 2018-10-10 11:03:06
    download_crc32c: ryN9+g==
    download_filename: files/23/4/3/434233.DOC
    download_filesize: 36724
    parts_crc32c: qiGAHw==
    parts_filesize: 56525
    parts_parsed_filename: files/2/0/2063126.csv
    text_crc32c: +Gw5Mw==
    text_filesize: 56419
    text_parsed_filename: files/2/0/2063126.txt
    topics: None
    committee_name: המיוחדת לפניות הציבור [the Special Committee for Public Petitions]

In [7]:
import os
import subprocess
import sys

for session in sessions_data[0][0]:
    for attr in ['text_parsed_filename', 'parts_parsed_filename']:
        pathpart = 'meeting_protocols_text' if attr == 'text_parsed_filename' else 'meeting_protocols_parts'
        url = 'https://production.oknesset.org/pipelines/data/committees/{}/{}'.format(pathpart, session[attr])
        filename = '/pipelines/data/committees/{}/{}'.format(pathpart, session[attr])
        os.makedirs(os.path.dirname(filename), exist_ok=True)
        cmd = 'curl -s -o {} {}'.format(filename, url)
        print(cmd, file=sys.stderr)
        subprocess.check_call(cmd, shell=True)


curl -s -o /pipelines/data/committees/meeting_protocols_text/files/2/0/2063122.txt https://production.oknesset.org/pipelines/data/committees/meeting_protocols_text/files/2/0/2063122.txt
curl -s -o /pipelines/data/committees/meeting_protocols_parts/files/2/0/2063122.csv https://production.oknesset.org/pipelines/data/committees/meeting_protocols_parts/files/2/0/2063122.csv
curl -s -o /pipelines/data/committees/meeting_protocols_text/files/2/0/2063126.txt https://production.oknesset.org/pipelines/data/committees/meeting_protocols_text/files/2/0/2063126.txt
curl -s -o /pipelines/data/committees/meeting_protocols_parts/files/2/0/2063126.csv https://production.oknesset.org/pipelines/data/committees/meeting_protocols_parts/files/2/0/2063126.csv
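As a quick sanity check (not part of the original flow), you can verify that the downloaded protocol files exist and are non-empty. `local_protocol_path` and `missing_or_empty` are hypothetical helpers that mirror the path layout used by the download loop above:

```python
import os

def local_protocol_path(pathpart, parsed_filename, base='/pipelines/data/committees'):
    # Mirror the local path built by the download loop above
    return os.path.join(base, pathpart, parsed_filename)

def missing_or_empty(paths):
    # Return the paths that were not downloaded or came back zero-length
    return [p for p in paths if not os.path.isfile(p) or os.path.getsize(p) == 0]
```

For example, collect the paths for both `text_parsed_filename` and `parts_parsed_filename` of every session and confirm `missing_or_empty(paths)` returns an empty list before moving on to rendering.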

Delete the dist hash files (this forces the render pipelines to re-process all the data)


In [8]:
%%bash
find /pipelines/data/committees/dist -type f -name '*.hash' -delete

Render pages

Run the render pipelines in the following order:

Meetings:


In [9]:
!{'cd /pipelines; dpp run ./committees/dist/render_meetings'}


./committees/dist/render_meetings: WAITING FOR OUTPUT

./committees/dist/render_meetings: RUNNING, processed 94 rows

./committees/dist/render_meetings: SUCCESS, processed 94 rows
INFO    :RESULTS:
INFO    :SUCCESS: ./committees/dist/render_meetings {'bytes': 1742, 'count_of_rows': 94, 'dataset_name': '_', 'failed meetings': 0, 'hash': 'fb41c59fff6c4eced438aa6e29556b24', 'kns_committees': 756, 'meetings': 94, 'mk_individuals': 1015}

Rendered meetings stats


In [10]:
from dataflows import Flow, load, printer, filter_rows, add_field

def add_filenames():
    
    def _add_filenames(row):
        for ext in ['html', 'json']:
            row['rendered_'+ext] = '/pipelines/data/committees/dist/dist/meetings/{}/{}/{}.{}'.format(
                str(row['CommitteeSessionID'])[0], str(row['CommitteeSessionID'])[1], str(row['CommitteeSessionID']), ext)
    
    return Flow(
        add_field('rendered_html', 'string'),
        add_field('rendered_json', 'string'),
        _add_filenames
    )

rendered_meetings = Flow(
    load('/pipelines/data/committees/dist/rendered_meetings_stats/datapackage.json'), 
    add_filenames(),
    filter_rows(lambda row: row['CommitteeSessionID'] in session_ids),
    printer(tablefmt='html')
).results()[0][0]


meetings_stats

#   CommitteeSessionID   num_speech_parts   hash   rendered_html                                                    rendered_json
1   2063122              186                None   /pipelines/data/committees/dist/dist/meetings/2/0/2063122.html   /pipelines/data/committees/dist/dist/meetings/2/0/2063122.json
2   2063126              209                None   /pipelines/data/committees/dist/dist/meetings/2/0/2063126.html   /pipelines/data/committees/dist/dist/meetings/2/0/2063126.json

Committees and homepage


In [13]:
!{'cd /pipelines; dpp run ./committees/dist/render_committees'}


./committees/dist/render_committees: WAITING FOR OUTPUT

./committees/dist/render_committees: SUCCESS, processed 0 rows
INFO    :RESULTS:
INFO    :SUCCESS: ./committees/dist/render_committees {'all chairpersons': 756, 'all committees': 756, 'all meeting stats': 94, 'all meetings': 94, 'all members': 7446, 'all mks': 1015, 'all others': 2, 'all replacements': 244, 'all watchers': 2, 'built index': 1, 'built_committees': 756, 'built_knesset_nums': 21, 'failed_committees': 0, 'failed_knesset_nums': 0}

Members / Factions


In [12]:
!{'cd /pipelines; dpp run ./committees/dist/create_members,./committees/dist/build_positions,./committees/dist/create_factions'}


./committees/dist/build_positions: WAITING FOR OUTPUT

./committees/dist/build_positions: RUNNING, processed 100 rows

./committees/dist/build_positions: RUNNING, processed 200 rows

./committees/dist/build_positions: RUNNING, processed 300 rows

./committees/dist/build_positions: RUNNING, processed 400 rows

./committees/dist/build_positions: RUNNING, processed 500 rows

./committees/dist/build_positions: RUNNING, processed 600 rows

./committees/dist/build_positions: RUNNING, processed 700 rows

./committees/dist/build_positions: RUNNING, processed 800 rows

./committees/dist/build_positions: RUNNING, processed 900 rows

./committees/dist/build_positions: RUNNING, processed 1000 rows

./committees/dist/build_positions: RUNNING, processed 1100 rows

./committees/dist/build_positions: RUNNING, processed 1200 rows

./committees/dist/build_positions: RUNNING, processed 1300 rows

./committees/dist/build_positions: RUNNING, processed 1400 rows

./committees/dist/build_positions: RUNNING, processed 1500 rows

./committees/dist/build_positions: RUNNING, processed 1600 rows

./committees/dist/build_positions: RUNNING, processed 1700 rows

./committees/dist/build_positions: RUNNING, processed 1800 rows

./committees/dist/build_positions: RUNNING, processed 1900 rows

./committees/dist/build_positions: RUNNING, processed 2000 rows

./committees/dist/build_positions: RUNNING, processed 2100 rows

./committees/dist/build_positions: RUNNING, processed 2144 rows

./committees/dist/build_positions: SUCCESS, processed 2144 rows

./committees/dist/create_members: SUCCESS, processed 0 rows
./committees/dist/create_factions: SUCCESS, processed 0 rows
INFO    :RESULTS:
INFO    :SUCCESS: ./committees/dist/build_positions {'bytes': 282211, 'count_of_rows': 2144, 'dataset_name': 'positions_aggr', 'hash': '0c318cd33a56a9fbb49f96172a462df0'}
INFO    :SUCCESS: ./committees/dist/create_members {}
INFO    :SUCCESS: ./committees/dist/create_factions {}

Showing the rendered pages

To serve the site, locate the corresponding local directory for /pipelines/data/committees/dist/dist and run:

python -m http.server 8000

Pages should be available at http://localhost:8000/